Integration of deep learning with Ramachandran plot molecular dynamics simulation for genetic variant classification

Summary Functional classification of genetic variants is a key for their clinical applications in patient care. However, abundant variant data generated by the next-generation DNA sequencing technologies limit the use of experimental methods for their classification. Here, we developed a protein structure and deep learning (DL)-based system for genetic variant classification, DL-RP-MDS, which comprises two principles: 1) Extracting protein structural and thermodynamics information using the Ramachandran plot-molecular dynamics simulation (RP-MDS) method, 2) combining those data with an unsupervised learning model of auto-encoder and a neural network classifier to identify the statistical significance patterns of the structural changes. We observed that DL-RP-MDS provided higher specificity than over 20 widely used in silico methods in classifying the variants of three DNA damage repair genes: TP53, MLH1, and MSH2. DL-RP-MDS offers a powerful platform for high-throughput genetic variant classification. The software and online application are available at https://genemutation.fhs.um.edu.mo/DL-RP-MDS/.


INTRODUCTION
Next-generation DNA sequencing technologies allow the collection of a massive quantity of genetic variation data at the population level, with the majority as single base variants. Although identifying the genetic variants causing apparent damage in protein structure can be straightforward, determining the functional impact of missense variants, in which a single base variant causes an amino acid change in a protein, is challenging as they mainly affect local rather than global protein structure. Currently, a large quantity of missense variants identified in the human genome remains unclassified. 1,2 For example, of the 56,483 missense variants identified in 170 DNA damage repair (DDR) genes, 50,427 (89.3%) remain as variant of uncertain significance (VUS). 3 The lack of functional information for the genetic variants limits their clinical applications. 4 Although many in silico tools have been developed with the aim of determining the functional impact of missense variants, the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG-AMP) guidelines conclude that the accuracy of these methods remains in question. 5 From the atomistic point of view, the functionality of a protein is determined by its structure maintained by intramolecular and intermolecular interactions through electrostatic, hydrogen bonding, Van der Waal, etc. As such, the impact of a genetic variant on protein function can be reflected by its impact on protein structural stability. In our previous study, we developed the Ramachandran plot-molecular dynamics simulation (RP-MDS) method to measure the impact of missense variants on protein structure. In the process, the torsion angle phi (4) and psi (J) of the protein secondary structural backbone are assimilated throughout MD trajectories. The alteration of backbone information reflects the impacts of the altered residue on protein structure. 6 Applying RP-MDS, we were able to classify multiple TP53 VUS. 6,7 However, there are several limitations in RP-MDS, including the need to manually define the cut-off value in order to separate between deleterious and non-deleterious variants, the difficulty to analyze the genes with insufficient known benign and pathogenic variants as the training data, and the challenge to measure minor structural changes often masked within the statistically averaged values.
Deep learning (DL) is increasingly applied in molecular biology studies. [8][9][10] We hypothesized that the integration of DL with RP-MDS may significantly increase the power of RP-MDS for genetic variant classification.

RP-MDS for classifying missense variants
We first used the data generated from the known benign and pathogenic variants to determine the cut-off values between the deleterious and non-deleterious variants. Lognormal distribution was fitted against the benign and pathogenic distribution ( Figure 2). Kolmogorov-Smirnov and Anderson-Darling goodness-offit test accepted lognormal distributions for all genes ( Figure 2D). 24,25 RP-MDS was able to classify benign and pathogenic variants in each gene as the benign variant distribution curves differed significantly from the distribution of the pathogenic variants. We determined the optimal cut-off points as 3.17 (true negative, TN = 58.3%; true positive, TP = 58.0%) for TP53, 3.38 (TN = 75.0%, TP = 71.1%) for MLH1, and 3.36 (TN = 66.7%, TP = 65.8%) for MSH2. The results for individual variants based on the settled cut-off positions were obtained (Tables S2 and S3).

DL-RP-MDS for classifying missense variants
The Ramachandran scatter plots (RSP) of benign and pathogenic variants generated by molecular dynamics simulation (MDS) were directed into AE (see Figure 1). The optimized hyperparameter configuration for the classifier was a fully connected neural network with one hidden layer of 1024 neurons and without dropout, together with a latent representation dimension of q = 14 for TP53, q = 8 for MLH1 and q = 20 for MSH2. The values were chosen based on the validation accuracy z95%. These various latent representation dimensions were employed to characterize the relationship between known benign and pathogenic variants ( Figures S1-S3). Examples of graphical illustrations for the latent dimensions were shown in Figure 3. Benign variants occupied a localized region and partially overlapped with pathogenic variants, which represented structural features observed in the benign and pathogenic variants. In contrast, the unique regions occupied by the pathogenic variants were interpreted as localized distinct protein deformities caused by the missense variants. For each variant, probabilities of ''deleterious, D'' and ''unknown, U'' were assigned (Table S2). Variants used as part of the training data were tested by the method again and scored with high probability (>90%) in their respective classifications. Incorrectly identified variants (false negative, FN; false positive, FP) by DL-RP-MDS with probabilities close to $50% implied that the protein structure was at the threshold of benign/pathogenic transitions.

Stratified cross-validation results
RP-MDS underwent a four-fold stratified cross-validation test with five repeats using the TP53, MLH1, and MSH2 variants, and their receiver operating characteristic (ROC) curves were computed. Each model was fitted against the lognormal distribution and was accepted by the Kolmogorov-Smirnov (K-S) and Anderson Figure S4). Although the training dataset outperformed RP-MDS, the testing dataset was only marginally better than RP-MDS. This grouping method was used to have an objective comparison with RP-MDS; thus, the model appeared to be overfitted. Although the lack of benign variants contributed to the low testing score for DL-RP-MDS, significant improvements in the testing data were achieved when using the strategy of ''grouped by frames'' ( Figure 4). This grouping method provided a better description of DL-RP-MDS operation, as it treated each RSP as an individual. DL-RP-MDS achieved 1.00 and 1.00 for the testing and training data across TP53, MLH1 and MSH2. The results of the four-fold stratified cross-validation demonstrated that DL-RP-MDS models were not overfitted and performed better than RP-MDS.

Comparing RP-MDS and DL-RP-MDS with 22 in silico methods
Multiple in silico computational methods have been developed based on different principles, such as familial segregation, 26 evolution conservation, 27 classical statistics, 28 experiment assays, 29 and  (Tables 2 and S3). 41 We quantified each method by determining the sensitivity, which was calculated by the number of TP predictions divided by all the number of pathogenic samples, and the specificity, which was calculated by the number of TN predictions divided by all the number of benign samples. For the pathogenic variants, 21 in silico methods except PrimateAI had the sensitivity between 78 and 100%; BayesDel_addAF, FATHMM, M_CAP, REVEL, and BayesDel_addAF reached 100%, and PrimateAI had 25%, the lowest score among all 22 in silico methods. In comparison, DL-RP-MDS reached 95% sensitivity; for the benign variants, only two (PrimateAI, PolyPhen2_HVAR) out of the 22 in silico methods had specificity >70%, in which PrimateAI reached 93.2%, the highest among all 22 in silico methods specificity, the remaining 20 in silico methods showed specificity in the range of 0-70%. In comparison, DL-RP-MDS achieved 100% specificity, the highest over all the in silico methods.
The overall performance was summarized in Table 2  In contrast to the traditional view of the RSP, where the emphasis was on the residues in the disallowed region, RP-MDS directly measures the dynamical structure changes by RSP and determines the deleteriousness of the variants based on the average structure dynamics reflected in the last 10 ns of MDS. Deleterious variants were thought to have deleterious effects on the surrounding environment such that the averaging dynamics of the protein (i.e., converting RSP into Ramachandran density plot, RDP) may reveal the effects of the variant residue in the proteins. Thus, the variants with significant differences can be identified as ''deleterious''. However, RDP can dampen the scrupulous details of structural change but mainly conserve significant information. DL-RP-MDS used RSP as input to conserve the vast information of the residue and backbones. It provides a high capability to distinguish energy landscapes between deleterious and benign variants. The first part of the DL approach, AE, nonlinearly compresses the individual RSP frame into the latent space; 45 and the second part, the neural network classifier, performs the classification based on  iScience Article the latent space to provide a probability for benign and deleterious variants. Another benefit of DL-RP-MDS is that it provides a continuous value (between 0 and 1) for high-confident variant classification instead of an arbitrarily defined cut-off for binary classification. DL-RP-MDS substantially improves the accuracy of RP-MDS as reflected by its differentiation between benign and pathogenic variants at AUC of nearly 100% when each gene was evaluated individually. DL-RP-MDS substantially overcomes the problem of overprediction of deleterious by the current in silico methods. 46 51 The results from our study of the missense variants of these three genes demonstrate that the classification of missense variants by DL-RP-MDS needs to be gene-specific. This is different from the principles used by the 22 in silico methods tested in this study as they assume that the same principles apply to all genes. We consider that such an assumption ignores the structural differences between different genes, contributing to their lower accuracy. For example, SIFT and PolyPhen-2 use evolution conservation for variant classification. However, they are not be suitable to classify the missense variants in BRCA1 and BRCA2, as the pathogenic variants in BRCA1 and BRCA2 were mostly originated in recent human history rather than evolution conservation from other species. 52 On the contrary, DL-RP-MDS uses protein In summary, DL-RP-MDS provides a highly accurate tool for missense variant classification. It is readily applicable to classify the missense variants of unknown significance and unclassified missense variants.

Limitations of the study
Several limitations of DL-RP-MDS suggest future studies to advance the prediction of missense variants: 1) The limited number of benign variants may skew the variant test toward more deleterious identification, even in the presence of SMOTE. 2) Not all deleterious variants change protein structure. Therefore, DL-RP-MDS may not be able to characterize these variants. 3) Interpretation of DL results can be elusive because of the ''black box'' issue discussed in ref. 53 , although DL-RP-MDS avoids ''black box'' interpretation as phi-psi angle based on protein backbone. 4) DL parameters are subject to fine-tuning. Although layers and dimensions were tuned specifically for TP53, MLH1, and MSH2, these parameters can be protein specific. The accuracy of the prediction may be subject to variants classified by different resources (i.e., different databases) under distinct classification criteria. 5) DL-RP-MDS uses MDS to observe structural changes within the protein, which does not account for external molecules and therefore does not provide information or properties such as LOF or gain of function. Furthermore, only fragments of the protein domains were experimentally purified and crystallized, making the MD simulations and the analyses limited to the structures available in the PDB database. 54

DECLARATION OF INTERESTS
The authors declare no competing interests.  where X is the input (standardized) atomic configuration (4 and J angles) at time t to be encoded, X* is the torsional configuration of the trained data at time t + 1, 45 k$k 2 denotes the L 2 norm and N input data points. The weights and biases were initialized randomly using the Glorot uniform initialization method, and the optimization was performed using Adam's algorithm with a learning rate of 0.001. 69 For the AE architecture and design, we followed the procedure in ref. 45 ; we used the Leaky-Rectified Linear Unit (ReLU) as the activation function s and implements two hidden layers (M = 2) with 1,000 nodes between the input and latent layers and dropout rate of 0.1.

Multi-label classification: Multi-layer neural network classifier
After encoding the complex torsional configurations from the MD simulations into the latent representation using the time-lagged AE, we performed classification for the data as deleterious or undefined function. We employed a fully connected neural network model N : R q /½0; 1 2 with M 00 hidden layers as the multi-label classification model, which can be expressed as L = N ðHÞ = À s 00 M 00 + 1 + W 00 where the model took the standardized latent representation from the AE as the inputs and mapped the inputs to two labels representing the probability of the variant being a deleterious or undefined function. We used the Leaky-ReLU as the activation function for the hidden layers (s m for m % ''M'') and the sigmoid activation function for the output (s M''+1 ). As the data for the classification were imbalanced due to the smaller number of Benign variants than that of pathogenic variants, SMOTE was employed to enhance Benign variants data. 15 The neural network classifier was trained as described for the AE. We used the cross-entropy as the loss function: where L Ã is ground-truth labels. The weights and biases were initialized randomly using the Glorot uniform initialization method, and the optimization was performed using Adam's algorithm with a learning rate of 0.001. Bayesian optimizations were performed for hyperparameter tuning for the neural network classifier to determine the best architecture, combined with a grid search for the number of latent dimensions.

QUANTIFICATION AND STATISTICAL ANALYSIS F 2 score validation and accuracy test
We use the F 2 score of the validation dataset as the objective function of the hyperparameter tuning:

Statistical analysis and cross-validation test
A four-fold stratified cross-validation test with five repeats was performed to assess the sensitivity and specificity of RP-MDS and DL-RP-MDS for all datasets. 70 In brief, the benign and pathogenic Ramachandran plots were randomly permuted and divided into four groups. In each run, one of the groups was selected iScience Article as the testing dataset and the remained groups were used as the training dataset for model training. A total of 20 models were created for RP-MDS and DL-RP-MDS, and the performances were represented by the ROC curve and AUC. The ROC curve for RP-MDS was generated by shifting the cut-off with an equal bin size of 0.01, and the ROC curve for DL-RP-MDS was calculated by using python library Scikit-learn.
Further, we used the balanced accuracy (BA) to calculate the average true negative rate (TNR) and true positive rate (TPR) of the variants classified by the model: balance accuracy = TPR + TNR 2 (Equation 8) The TPR and TNR values for RP-MDS were calculated based on the optimal cut-off value, and TPR and TNR values for DL-RP-MDS were calculated based on the results generated from the model.

ll
OPEN ACCESS